
feat(wasm): SmolLM2-135M fast default + Llama 1B quality option#37

Merged
unamedkr merged 2 commits into main from feat/wasm-smollm2-default on Apr 10, 2026
Conversation

@unamedkr
Collaborator

1B model prefill takes 15-30s+ in WASM — feels broken. SmolLM2-135M: 135MB, <2s prefill, responsive.

| Model | Download | Prefill (WASM est.) | Quality |
|---|---|---|---|
| SmolLM2 135M (default) | 135 MB | <2s | Basic |
| Llama 3.2 1B (option) | 770 MB | 15-30s | Good |

🤖 Generated with Claude Code

unamedkr and others added 2 commits April 10, 2026 21:01
Two changes for WASM demo reliability and speed:

1. Model: switch from Qwen3.5-0.8B (base, gated, Qwen arch issues)
   to Llama 3.2 1B Instruct (verified working, good quality, public
   HuggingFace URL, proper Instruct tuning for chat).

2. Speed: add -DTQ_NO_Q4=1 to WASM build. Skips the load-time Q4
   reconversion (GGUF Q4_K_M → FP32 → internal Q4) which was
   expensive and redundant for already-quantized models. Uses GGUF
   on-the-fly dequant instead. Saves several seconds of model init
   and reduces peak memory usage.

   Added compile-time #ifdef TQ_NO_Q4 guard in quant.h so it works
   in WASM (no getenv). Native builds are unaffected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
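The compile-time guard described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the actual quant.h code: the function name `skip_q4_reconversion` and the native-build `getenv` fallback are assumptions; only the `TQ_NO_Q4` macro and the "no getenv in WASM" constraint come from the commit message.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Decide whether to skip the load-time Q4 reconversion.
   In WASM builds, compiled with -DTQ_NO_Q4=1, the decision is
   made at compile time (no getenv available). Native builds
   keep a runtime opt-out and are otherwise unaffected. */
static bool skip_q4_reconversion(void) {
#ifdef TQ_NO_Q4
    /* WASM build: always use GGUF on-the-fly dequant. */
    return true;
#else
    /* Native build: hypothetical runtime toggle via env var. */
    const char *v = getenv("TQ_NO_Q4");
    return v != NULL && v[0] == '1';
#endif
}
```

The key design point is that the `#ifdef` branch compiles the `getenv` call out entirely, so the WASM build never references an API the platform lacks.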
1B model causes 15-30s+ prefill hang in WASM — unusable as default.
SmolLM2-135M: 135MB download, <2s prefill, ~10-20 tok/s in WASM.
Quality is basic but responsive — proper demo experience.

Llama 3.2 1B Instruct kept as "Quality" option for users willing
to wait for the larger model.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@unamedkr unamedkr force-pushed the feat/wasm-smollm2-default branch from 7c38ac7 to 8330cb5 on April 10, 2026 12:01
@unamedkr unamedkr merged commit 4cc5598 into main Apr 10, 2026
3 checks passed
@unamedkr unamedkr deleted the feat/wasm-smollm2-default branch April 10, 2026 12:01